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Abstract 

The process comparing the empirical cumulative distribution function of the 
sample with a parametric estimate of the cumulative distribution function is known 
as the empirical process with estimated parameters and has been extensively em- 
ployed in the literature for goodness-of-fit testing. The simplest way to carry out 
such goodness-of-fit tests, especially in a multivariate setting, is to use a parametric 
bootstrap. Although very easy to implement, the parametric bootstrap can become 
very computationally expensive as the sample size, the number of parameters, or 
the dimension of the data increase. An alternative resampling technique based on 
a fast weighted bootstrap is proposed in this paper, and is studied both theoretically 
and empirically. The outcome of this work is a generic and computationally efficient 
multiplier goodness-of-fit procedure that can be used as a large-sample alternative 
to the parametric bootstrap. In order to approximately determine how large the 
sample size needs to be for the parametric and weighted bootstraps to have roughly 
equivalent powers, extensive Monte Carlo experiments are carried out in dimension 
one, two and three, and for models containing up to nine parameters. The com- 
putational gains resulting from the use of the proposed multiplier goodness-of-fit 
procedure are illustrated on trivariate financial data. A by-product of this work 
is a fast large-sample goodness-of-fit procedure for the bivariate and trivariate t 
distribution whose degrees of freedom are fixed. 

Key words and phrases: asymptotically linear estimator, empirical process, multi- 
plier central limit theorem, multivariate t distribution. 



1 Introduction 



Let M = 
(c.d.f.s) on 
Given a sample X\, 
interested in testing 



{Fq : 9 G O} be a parametric family of cumulative distribution functions 
M. d , where O is an open subset of MP, for some integers d > 1 and p > 1. 

d with common c.d.f. F, we are 



X„ of i.i.d. random vectors on 



H : F G M against H 1 : F ^ M 
using statistics based on the empirical process 

¥ n {x) = V^{F n {x)-Fe n {x)}, 



x G 



where 



FJx) 



n 

n / — ' 

i=l 



< X 



a; G 



is the empirical c.d.f. computed from the random sample X 1; . . . , X n , and is a para- 
metric estimator of F computed under the null hypothesis that F belongs to Ai, that is, 
under the assumption that there exists 9q G O such that F = Fg . The latter parametric 
estimator of F under H Q is obtained from an estimator 9 n of #0 based on Xi, . . . ,X n . 
The above problem has been extensively studied in the literature (see e.g. 



Kac, Kiefer, and Wolfowitz 1955 Sukhatme 1972 Durbin[ 1973 Stephens 



Darling 


1955| 


1976; 


K 


rmal- 



adze, 1981 Durbin, 1975) and is often referred to as the problem of goodness of fit when 
parameters are estimated. 

Two test statistics that are frequently used are the Cramer-von Mises statistic 



n 



{F n (x) - Fg n (x)} 2 dF dn (x) 



and the Kolmogorov-Smirnov statistic 



n sup \F n 



X 



(2) 



(3) 



Minor variations of these will also be considered in this work. 

One important and enduring issue regarding tests of goodness of fit based on Q con- 
cerns the computation of critical values or p- values for statistics derived from F n . Indeed, 
under classical regularity conditions that will be explicitly stated in the forthcoming sec- 
tion, the weak limit of the process given in ([!]) involves a drift term that typically makes 
goodness-of-fit tests based on ¥ n distribution-dependent. 



To solve this problem in the univariate case, Khmaladze (1981) proposed to use the 
theory of martingales to transform F n to an asymptotically distribution-free process. 
In the same context, Durbin (1973, 1975) investigated several approaches to compute 



approximate critical values for the Kolmogorov-Smirnov statistic. The martingale trans- 



form of Khmaladze ( 1981 ) and Durbin ( 1975 ) 's approach based on approximate boundary 



crossing probabilities are reviewed and compared in Parker (2010) when A4 is a location 



scale or a scale-shape univariate family. For tests based on the Cramer-von Mises, the 
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Anderson-Darling or the Watson statistics, Stephens (1976) (see also Sukhatme, 1972 



Stephens, 1974) used the fact that the asymptotic distributions of these statistics can be 
expressed as a weighted sum of xl variables and explained in detail how to compute the 
unknown weights when A4 is the univariate normal or the exponential distribution. 

A generic, very simple to implement resampling technique, that can be used to carry 
out tests of goodness of fit based on F n in a general multivariate setting, is the so-called 
parametric bootstrap (see e.g. Romano, 1988 Stute, Gonzales Manteiga, and Presedo 



Quindimil, 1993 Jogesh Babu and Rao 2004 



Genest and Remillard 2008). To fix ideas 



let S n be the test statistic and assume that it is a continuous functional of the empirical 
process F n . We shall thus write S n = 0(F n ). For some large integer N and given an 
estimator 9 n of 6q based on X\, . .. ,X n , the parametric bootstrap consists of repeating 
the following steps for every k £ {1, . . . , N}: 

(a) Generate a random sample x[ k \ . . . , X n k ^ from c.d.f. Fg n . 

(k) (k) 

(b) Let Fn and 9 n stand for the versions of F n and 6 n estimated from the random 
sample x[ k \ . . . , X {k) . 



(c) Form an approximate realization of the test statistic under H as Sffi 
where F^ fc) = ^(F^ - F. w ). 



(F 



n ), 



With the convention that large values of S n lead to the rejection of Hq, an approximate 
p- value for the test is finally given by iV -1 Yuk=i 10$" — S n ). 

When n, p or d are large, the above procedure can become very computationally 
expensive as, for every k G {1, . . . , N}, it requires the generation of a random sample 
from Fg n and the estimation of 6 from the generated data, both steps being potentially 
very time-consuming. 

A computationally more efficient approach consists of using a weighted bootstrap in the 



sense of Burke (2000) and Horvath, Kokoszka, and Steinebach (2000) (see also Horvath 



2000, and the references therein). This resampling technique, based on the multiplier 
central limit theorem for empirical processes (see e.g. van der Vaart and Wellner, 2000 



Kosorok 




2008 


), was recently used for 


Kojadinovic, Yan, and Holmes 


( 


2011) 



While being asymptotically equivalent to the 
parametric bootstrap under the null hypothesis, it was found that, in the case of large 
samples, the use of the weighted instead of the parametric bootstrap could reduce the 
computing time from about a day to minutes for certain multivariate multiparameter 
models. 

The aim of this paper is to investigate, both theoretically and empirically, goodness- 
of-fit tests for multivariate distributions based on a weighted bootstrap in the sense of 
Burke (2000). From a practical perspective, a large scale Monte Carlo study was carried 



out in order to approximately determine the sample size from which the parametric and 
the weighted bootstrap have roughly the same power. For small samples, the parametric 
bootstrap usually appears more powerful, and, since it typically has an acceptable com- 
putational cost in that case, it is the recommended approach. For larger samples, the use 
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of the parametric bootstrap can become very tedious in practice and the derived faster 
multiplier goodness-of-fit procedure appears as a natural alternative. 

The paper is organized as follows. In the second section, to extend the breadth of the 
approach, the theoretical results establishing the validity of the approach are stated in 
the context of the theory of empirical processes as presented for instance in van der Vaart| 



and Wellner (2000). The results of a large-scale simulation study are partially reported 



in the third section for univariate, bivariate and trivariate data sets and parametric c.d.f. 
families with up to nine parameters. The last section is devoted to a detailed illustration 
on real financial data. All the proofs and computational details are relegated to the 
appendices. 

Note finally that the code of all the tests studied in this work will be documented and 
released as an R package accompanying the paper. 



2 The weighted bootstrap for goodness-of-fit testing 



To extend the breadth of our study, we shall work in the framework of the theory of 



empirical processes as presented for instance in van der Vaart and Wellner (2000) or 



Kosorok ( 2008 ) . Given a random sample X\ , . 
d , the empirical measure is defined to be P„ 



on 



X n from a probability distribution P 
Sr=i , where 5 X is the measure 



n 



-i 



that assigns a mass of 1 at x and zero elsewhere. For a measurable function / : R 
F n f denotes the expectation of / under P n , and Pf the expectation under P, i.e., 



— > 



1 n 

n ' 



and Pf 



fdP 



The empirical process evaluated at / is then defined as G n f = </n(F n f — Pf). 

As we continue, P denotes a P-Donsker class of measurable functions, which means 
that the sequence of processes {G n f : / G P} converges weakly to a P-Brownian bridge 
{Gpf : / G P} in the space £°°(P) of bounded functions from P to R equipped with 



the uniform metric in the sense of Definition 1.3.3 of van der Vaart and Wellner (2000). 



Following usual notational conventions, this weak convergence will simply be denoted by 
G„ -w Gp in £°°(P). By taking P to be the class of indicator functions of lower- left 

orthants in R d , i.e., P = {x y l(x < t) : t G R } with E = MU {— oo, oo}, one recovers 
a more classical version of Donsker's theorem stating that ^/n(F n — P), where P is the 

c.d.f. associated with P, converges weakly in the space £°°(M. ) of bounded functions on 
R to an P-Brownian bridge 0, i.e., a tight centered Gaussian process with covariance 
function E{f3(x)f3(y)} = F(x Ay)- F(x)F(y), x, y G R d . 

The advantage of working in this general framework is that the forthcoming results 
remain valid for any P-Donsker class P. Although the collection of all indicator functions 
of lower-left orthants in R d may appear as a natural choice for P in the goodness-of-fit 
framework under consideration, other choices might be of interest such as the class of 



indicator functions of closed balls, rectangles or half-spaces (see Romano, 1988, for a 
related discussion regarding the choice of P). 
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In the rest of the paper, convergence in probability is to be understood in outer 
probability (see e.g. van der Vaart, 1998, Chapter 18). 



2.1 Theoretical results 

Given i.i.d. random variables Z\, . . . ,Z n with mean 0, variance 1, satisfying J °° {P(|Zi| > 
x)} l l 2 (\.x < oo, and independent of the random sample X 1; . . . ,X n , the following multi- 
plier versions of G n will be of interest: 

1 n n 
G « = 7^5>^- P ) and G n = 7=E^-^ W 

where Z = n _1 Y^i=i 

Let {Pg : G O} be an identifiable family of distributions, where O is an open subset 
of MP, and let {ipg : G (9} be a class of measurable functions from M. d to M. p . Assume 
additionally that 

(Al) for any 6> G O, the map # i-> Pg from IR P to ^(J 7 ) is Frechet differentiable at #o, 
i.e, there exists a map Pg : J 7 — > M. p such that 

sup |P e / - PgJ -{9- e ) T PgJ\ = o{\\9 -6 \\) as 9^ e , 
(A2) for any 9 G O, 

SUp \\Pg f -PgJ\\ =0(1) as fl^ # , 

/e-F 

(A3) for any # G (9, there exists a 5 > such that the class of measurable functions 
{ijjg : || - 9 II < 5} is P-Donsker, 

(A4) for any 9 G 0, 

/ \\Mx) -^„(^)|| 2 dP(a;) = o(l) as 0„, 

(A5) for any 9 £ O and a random sample X%, . . . , X n from Pg , 9 n is an estimator of 9$ 
that is asymptotically linear with influence function ipg , i.e., 

1 n 

v^(0 n = ^ (x.) + OPeo (i), 

V 1=1 

with Pg ipg Q = and P 0O ||^ O || 2 < oo. 

The following two propositions, proved in Appendix [A] are at the root of the goodness- 
of-fit procedure to be given in the next subsection. 
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Proposition 1. Let X\, . . . ,X n be a random sample from the distribution Pg for some 
60 G O. If Assumptions (A1)-(A5) are satisfied, then 

(/ h. v^(P n / - P e J), f ^ G'J - Q'JlPeJ, f^G'J- K^lKf) 

converges weakly to 

(/ 1 v GgJ - Ge^lPoJ, f ^ G' e J - G> e j] P e J, f ^ G' e J - G' e jJ P e j) 

in {^(J 7 )} 3 , where Go , a Pg -Brownian bridge, is the weak limit of G n , and G' do is an 
independent copy of Gg . 

Proposition 2. Let X%, . . . ,X n be a random sample from a distribution P ^ {Pg : 
9 G O}. If Assumptions (Al)-(A4) are satisfied, and if there exists 9q G O such that 
\/n(6 n — 9q) converges in distribution under P, then 

sup|>/^(P n /-P fln /)|4oo 

while 

(/ -> G'J - G' n ^J n PeJ, f ^ G'J - GyJ n Poj) 

converges weakly to 

(/ 1 y G'pf - G' P ^lPeJ, f » G' P f - G^Pej) 
in {^(J 7 )} 2 , where G' P , a P-Brownian bridge, is the weak limit of G' n . 

2.2 The goodness-of-fit procedure 

Let us reformulate the results stated in Propositions [T] and [2] when J 7 is the class of 
indicator functions of lower-left orthants in IR d . As the empirical process G' n defined 
in Q depends on the unknown true distribution P, we shall only deal with the parts of 
the above results concerned with G'^. Thus, let 

1 - 

r » = 77^ !D Z « " ^ ~ V&TO^0»0}> x e ^ ( 5 ) 

" i=l 

where {Fg : 9 G O} = Ai is the set of c.d.f.s associated with the parametric family of 
distributions {Pg : 9 G O}, and where, for any x G M. d , Fg(x) is the gradient of Fg(x) 
with respect to 9. Proposition [T] then states that, if P = Pg for some 9q G O, then, 
under explicit regularity conditions, F n and F" jointly converge weakly in )} 2 
to independent copies of the same limit. Roughly speaking, under the null hypothesis 
H Q : F G M., where F is the c.d.f. associated to P, the empirical process F^ is close to 
being an independent copy of F n . Every new set of multipliers Z\ . . . , Z n gives a new 
approximate independent copy of F n . 
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Proposition [2] is concerned with the behavior of F n and F" under the alternative 
hypothesis Hi : F (jL Ai. If F ^ {Fq : 9 G O}, then, under explicit regularity condi- 
tions, any sensible statistic derived from F n will tend to infinity in probability because 
sup^gjjd |F n (x)| — > oo, while F" still converges weakly. 

The verification of Assumptions (A1)-(A5) for a given family Ai of c.d.f.s is discussed 
in Section [2751 

Now, for illustration purposes, let us assume that the test statistic is the Cramer-von 
Mises statistic S n defined in (|2]). The results of Propositions [l] and [2] reformulated above 
then suggest adopting the following goodness-of-fit procedure: 

1. Estimate #0 using an asymptotically linear estimator 9 n . 

2. Compute the test statistic S n = J Rd {¥ n (x)} 2 dF en (x). 

3. Then, for some large integer N, repeat the following steps for every k £ {1, . . . , iV}: 

(a) Generate n i.i.d. random variates Zi,...,Z n with expectation 0, variance 1 
and satisfying J Q °°{P(\Zi\ > x)} 1//2 dx < 00. 

(b) Form an approximate realization of the test statistic under Ho by 

S ( n k) = [ K(x)} 2 dF, n (x), 
JR d 

where F" is defined in (J5]) 

4. An approximate p- value for the test is then given by iV -1 Ylk=i l^™^ — ^n)- 

The results given in the previous subsection imply that, under H : F 6 Ai and 
regularity conditions, the above testing procedure will hold its level asymptotically. In- 
deed, (S n , Sn\ . . . , Sn) converge jointly in distribution to independent copies of the 
same limit and, thus, the approximate p-value computed in Step 4 of the procedure will 
be approximately standard uniform. On the other hand, under Hi : F (jL M. and regular- 
ity conditions, S n tends in probability to infinity while (Sn, • • • , S n N ^) still converges in 
distribution. It follows that the approximate p-value will tend to zero in probability. 

The potential computational advantage of the multiplier goodness-of-fit procedure 
over the parametric bootstrap is best seen when Step 3 above is compared with the 
procedure recalled in the introduction. Roughly speaking, random number generation 
from the fitted distribution and estimation of the parameters from the generated data at 
each parametric bootstrap iteration is replaced by the random generation of n multipliers, 
typically from the standard normal distribution or the uniform distribution on {—1,1}. 

To obtain even faster goodness-of-fits tests, we consider variations of S n and of the 
Kolmogorov-Smirnov statistic T n defined in ^ given by 

S* n = / {¥ n (x)}MF n (x) = - ^{F n (X,)} 2 = ^{F n (X,) - F e „(X 4 )} 2 , (6) 



7 



and 

T* n = max \W n {X i )\ = yfti max - F 9n {Xi)\ , (7) 

ie{l,...,n} ie{l,...,n} 

respectively. When the goodness-of-fit procedure is based on S* (resp. T*), the multiplier 
realizations of Step 3 (b) are thus simply given by 

S* n {k) = [ {W>>(x)} 2 dF n (x) = - J2iK(m 2 fresp. = max K(Xi)\) ■ 



2.3 About the assumptions 



When J- is the class of indicator functions of lower-left orthants in WL d , Assumptions (Al) 
and (A2) concern the existence and the smoothness of the gradient Fg. As discussed for 
instance in Genz and Haeusler (2006), these two conditions are typically satisfied if, for 



any e 0, Fg has a p.d.f. fg, and the function (x, 0) i-> fg(x) is smooth in both x and 0. 



According to Example 19.7 in van der Vaart (1998), Assumption (A3) is satisfied if 



there exists a measurable function m such that 



\(x) -ipo 2 (x)\\ <m(x) ||0i-0: 



2||j 



V0i,0 2 e : ||0 — <5> 1 1 < 5}, 



with P|m| r < oo for some r > 0. The above inequality is satisfied for many M-estimators 
with Pm 2 < oo. Assume furthermore that the functions h- > ipg(x) are continuously 
differentiable and denote by ipg(x) the gradient of ipg(x) with respect to 0. Since \-t ipg(x) 



is continuous, then, as explained in van der Vaart (1998, page 53), the natural candidate 
for m is ip{x) = swp d .^ e _ e ^ <s ||^(x)||. It then remains to verify that Pip 2 < oo. In other 
words, Assumption (A3) is typically satisfied if there exists a square-integrable function 
ip such that ||^(a;)|| < ip(x) when is close to 9q. Furthermore, if (fsl) holds, then, for 
W9-9J <6, 



bg{x)-^g {x)\\ 2 dP{x) < ||0 



m 2 (x)dP(x) 



and therefore, Assumption (A4) holds if Pm 2 < oo. 

Assumption (A5) is related to the estimation of 8q under the null hypothesis, and is 
typically established in the theorems proving the asymptotic normality of M-estimators. 



It follows that (A5) holds under the assumptions of these theorems (see e.g. van der 



Vaart, 1998, Section 5.3). 



3 Finite-sample performance 

In order to study the finite-sample performance of the proposed multiplier goodness-of-fit 
procedure, extensive Monte Carlo experiments were carried out in dimension one, two 
and three, for two- to nine-parameter families of absolutely continuous c.d.f.s. In all 



cases, the multipliers in the goodness-of-fit procedure given in Section are taken from 
the standard normal distribution. 



S 



For a given hypothesized family M. = {Fg : 9 G O}, the estimation of the unknown 
parameter vector 6q was performed using numerical maximum likelihood estimation based 
on the Nelder-Mead optimization algorithm as implemented in the R optim routine ([R] 
Development Core Team, 2011). Method-of-moment estimates of 9q were used as starting 



values. 

The use of maximum likelihood estimation implies, under classical regularity condi- 
tions (see e.g. van der Vaart, 1998, Section 5.5), that, for any 9 G O, the influence 



function ipg appearing in Assumption (A5) is given by 

^) = I ei J j^Hfo o (x)>0}, 

where fg is the probability density function (p.d.f.) associated with Fg , fe (x) is the gra- 
dient of f$ (x) with respect to 9q, and Ig is the Fisher information matrix. Equivalently, 
il)e {x) can be taken as I^ 1 times the gradient of log/e (a;) with respect to 9q. 

From Section 2.2 and in particular (|5|, we see that ipg n is to be computed instead 
of the unknown function ipg . To obtain a more generic implementation, all gradients, 
including Fg n , were computed numerically using Richardson's extrapolation method as 
implemented in the R package numDeriv (Gilbert, 2011). As shall be explained in more 



detail in the forthcoming subsections, this numerical approach was compared with a more 
precise implementation when A4 is the set of c.d.f.s of the multivariate t distribution with 
fixed degrees of freedom (d.f.). In both cases, the matrix I 9n was estimated as the sample 
covariance matrix of the sample fg^X^ / fg^X^, . . . , fg n {X n )/fg n (X n ). 



3.1 Univariate experiments 

In dimension one, four candidate test statistics were considered: numerical approxima- 
tions of S n and T n defined in ^ and (|3]), respectively, and the statistics 5* and T* defined 
in ^ and ([7]), respectively. 

The numerical approximations of S n and T n are based on a uniform grid u\, . . . , u m 
on (0,1) which is transformed as yi = F e _1 (uj), i G {l,...,m}. The goodness-of-fit 
procedure given in Section 2^ is then based on the comparison of 

1 m ( \ 

S n ~- y2{¥ n ( yi )} 2 resp. T n w max |F n (^)| , 

m \ ie{i,...,m} / 

with the set of multiplier realizations 

S f « ~ Y.{K^)Y fresp. T« « . max \K{ yi )\] , ke{l,...,N}, 
m ' V ie{i,...,m} I 

i=i x ' 

where is defined in ([5]). The grid size m was set to 1000 in our experiments. 

In terms of distributions, five two-parameter families of absolutely continuous c.d.f.s 
were considered, namely the normal, t, logistic, gamma and Weibull distributions. The 
d.f. of the t distribution were fixed (to five) to avoid numerical issues that arise when 
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attempting to estimate them (see e.g. Nadarajah and Kotz, 2008, for a review). These 



distributions are abbreviated by N, T5, L, G and W as we continue. 

For data generation, the expectation and the variance of N were set to 10 and 1, 
respectively. For each of the remaining four distributions, its parameters were determined 
as approximate minimizers of the Kullback-Leibler divergence between the p.d.f. of the 
distribution and the p.d.f. of the normal iV(10, 1). The expectation of T5 was thus fixed 
to 10 and its dispersion (scale) parameter to 0.856. The location and scale parameters of 
L were fixed to 10 and 0.572, respectively. The shape and rate parameters of G were set 
to 98.671 and 9.866, respectively, while, for W, the shape and scale parameters were fixed 
to 10.618 and 10.452, respectively. The parametrization of L, G and W used in this work 
corresponds to the implementation of these distributions in the R statistical system. A 
plot of the p.d.f.s of the five data generating distributions is given in Figure [1} 



[Figure 1 about here.] 



For each data generating distribution, 1000 random samples of size n were generated 
for n = 100, 200, 300, 400 and 500. Under each scenario, the five families mentioned 
above were hypothesized. For each of the four test statistics, the proposed multiplier 
goodness-of-fit procedure (abbreviated by MP as we continue) was compared with the 
parametric bootstrap-based goodness-of-fit procedure (abbreviated by PB). All the tests 
were carried out at the 5% level of significance. As the Cramer-von Mises statistics S n 
and 5* consistently outerperformed the Kolmogorov-Smirnov statistics T n and T*, we 
only report the rejection rates of the former in Table [TJ The five horizontal blocks of the 
table correspond to the five data generating distributions whose parameters were given 
previously. The first two columns contain the empirical powers of the Shapiro- Wilk and 
Jarque-Bera tests of univariate normality. The empirical levels of all the tests are in italic 
in the table. As can be noticed, these are, overall, reasonably close to the 5% significance 
level even for small n. 



[Table 1 about here.] 



In terms of rejection rates, the tests based on S n and S* are, overall, very close. From 
a practical perspective, recall that the latter are easier to implement. Notice also that 
the tests based on PB are, overall, more powerful than those based on MP for small 
n. However, as n increases, the difference in rejection rate vanishes in most scenarios. 
These results suggest that the multiplier procedure can be safely used as a large-sample 
alternative to the parametric bootstrap in dimension one. 

To complement the previous study, the levels of the tests were also investigated for 
heavy-tailed and strongly asymmetric distributions. More specifically, the levels of the 
tests were estimated from 1000 random samples of size n generated from the standard t 
distribution with fixed d.f. v e {1, 2, 3,4, 5}, and from the gamma distribution with rate 
parameter 0.5 and shape parameter in {1,2,4,8, 16}. The obtained rejection rates (not 
reported) were found to be reasonably close to the 5% nominal level in all scenarios and 
for all n G {100, 200, 300, 400, 500}. 
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3.2 Bivariate and trivariate experiments 



In the bivariate and trivariate simulations, only the goodness-of-fit procedures based on 
S* and T* were used, and five families of distributions were considered. In addition 
to the multivariate normal (abbreviated by N) and the multivariate t distribution with 
five d.f. (abbreviated by T5), three absolutely continuous families were constructed from 



Sklar (1959)'s representation theorem. The latter result states that any multivariate c.d.f. 
F : R d — > [0, 1] whose marginal c.d.f.s Fx, . . . , F d are continuous can be expressed in terms 
of a unique <i-dimensional copula C as 

F{x) = C{Fx(xx), • • • , F d (x d )}, x = (x u . . . , x d ) G R d . 

A first non-Gaussian family was obtained by taking F±, . . . , F d to be gamma and C to be 
a normal copula, a second choice was to take Fi, . . . , F d to be t with five d.f. and C to be 
a normal copula, while a third family was obtained by taking Fi, . . . , F d normal and C 
to be a Clayton copula. These three families are respectively abbreviated by GN, T5N 
and NC as we continue. Notice that the multivariate normal N is obtained by taking 
Fx, . . . , F d normal and C normal, while T5, the multivariate t with five d.f., is obtained 
by taking Fx, . . . , Fa to be t with five d.f. and C to be a t copula with five d.f. 

In dimension two, all five families have five parameters: two parameters per margin 
and one parameter for the copula. In dimension three, N, T5, GN and T5N have nine 
parameters (two parameters per margin and three correlation coefficients for the copulas), 
while NC has seven parameters (two parameters per margin and one parameter for the 
Clayton copula). 

For data generation, the margins of the distributions N and NC were taken to be 
the N(10, 1), the gamma margins of GN were chosen to have shape and rate parameters 
equal to 98.671 and 9.866, respectively, while the margins of T5 and T5N were set to 
have expectation 10 and dispersion 0.856. In order to study the effect of the dependence 
on the tests, the correlation coefficients of the normal and t copulas were first taken equal 
to 0.309 and then to 0.588, while the parameter of the Clayton copula used to construct 
NC was first taken equal to 0.5 and then to 1.333. This corresponds to requiring that 
the value of Kendall's tau (denoted by r in the tables) for all bivariate margins of the 
copulas is first equal to 0.2 and then to 0.4. 

For N and T5, random number generation and the computation of the p. d.f. and 



the c.d.f. was performed using the excellent mvtnorm R package (Genz, Bretz, Miwa 



Mi, Leisch, Scheipl, and Hothorn 


(Kojadinovic and Yan, 


2010 


) was 



2011). For NC, GN and T5N, the copula package 



The results for dimension two are given in Table [2j The columns SW contain the rejec- 



tion rates of the multivariate extension of the Shapiro- Wilk test proposed by Villasenor- 



Alva and Gonzalez- Estrada (2009) and implemented in the R package mvShapiroTest 



(Gonzalez-Estrada and Villasenor-Alva 2009). Unlike in dimension one, the comparison 



between the multiplier procedure and the parametric bootstrap was carried out only in 
the situation where a bivariate normal distribution is hypothesized. In that case, the 
maximum likelihood estimates are the sample mean and the sample covariance matrix, 
and these were used in the parametric bootstrap to decrease its computational cost. The 
rejection rates of the multiplier procedure and the parametric bootstrap are reported in 
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the columns N-MP and N-PB, respectively. The parametric bootstrap appears clearly 
more powerful than the multiplier but, as in dimension one, the difference in rejection 
rate tends to vanish as n reaches 500. Also, the comparison of the columns SW, N-MP 
and N-PB reveals, without surprise, that the specialized multivariate Shapiro-Wilk test 
is generally more powerful than its two generic competitors. 



[Table 2 about here.] 



By looking at the entries in italic in Table |2j we see that the multiplier procedure is, 
overall, too conservative for smaller n, but that the agreement between the empirical levels 
and the 5% significance level improves as n increases. The effect of stronger dependence 
on the power is variable. For instance, when data are generated from NC, the families 
N, T5N, T5 and GMN are easier to reject for r = 0.4 than for r = 0.2, but, when the 
true distribution is N, it is easier to reject T5N and T5 in the case of weaker dependence. 
Notice finally that, as could have been expected, it is very difficult to distinguish between 
T5 from T5N for these sample sizes. 

The results of the trivariate Monte Carlo simulations are given in Table [3} This 
time, to make the computational cost of the simulation acceptable, only the multiplier 
procedure was used. By looking at the entries in italic, we can see, as in the bivariate 
case, that the tests are, overall, too conservative, but that the empirical levels improve 
as n increases. A comparison with Table [2] also reveals that the rejection rates are higher 
in dimension three than in dimension two, which suggests that the differences between 
the distributions are easier to detect as d increases from 2 to 3. From a more practical 
perspective, we see that the empirical powers approach 100% as n reaches 500 in many 
scenarios under the alternative hypothesis. 



[Table 3 about here.] 



3.3 The case of the multivariate t distribution 



When A4 is the set of c.d.f.s from the multivariate t distribution with fixed d.f., two 
different ways of computing the gradients F e and fe/ fe, £ O, were considered. The 
first one is the generic approach mentioned earlier relying on Richardson's extrapolation 
method as implemented in the R package numDeriv. This numerical method requires 
numerous evaluations of the c.d.f. and the log p. d.f. of the multivariate t as it is based 
on finite differences. For that reason, in the trivariate experiments, the algorithm pa- 
rameter of the pmvt and pmvnorm functions of the mvtnorm package was set to TVPACK 
(Genz, 2004) instead of the default algorithm GenzBretz. Indeed, the latter is based on 
randomized quasi Monte Carlo methods ( Genz and Bretz 1999 , 2002 ) which implies that 
its results depend on random number generation and are therefore not fully reproducible. 
The second approach, which is expected to be more precise, is based, as explained in 
Appendix [bJ on the fact that the gradient fe/ fe can be computed analytically, and on 
the fact that the gradient F e can be expressed in terms of the c.d.f. and the p. d.f. of the 
multivariate t. 
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These two approaches were thoroughly compared and their results were found to 
be very close. As could have been expected, the second more analytical approach is 
much faster as it requires significantly fewer evaluations of the c.d.f. and the p.d.f. of the 
multivariate t. This aspect will be illustrated in the next section. 

Let us finally discuss the estimation of the parameters of the multivariate t with fixed 
d.f. v using the Nelder-Mead algorithm. In dimension three for small v > 3, we noticed 
that the estimation of the nine parameters was extremely sensitive to the choice of the 
starting values, and that the multiplier test tended to be too liberal. Such issues were 
not observed for the other families of c.d.f.s used in the trivariate simulations. Improved 
results were obtained by changing the scale of the optimization using the parscale ar- 
gument of the R optim routine. The latter argument was set to the vector of starting 
values with a guard against values too close to zero. Additional simulations were carried 
out for v — 3, 5, 7, 10, 20 and 30 to study the empirical levels of the multiplier test. 
Table [4] gives such empirical levels when data are generated from bivariate and trivariate 
t distributions with v d.f. fitted from the financial data studied in Section HI As one can 
notice, in dimension two, the empirical levels are reasonably close to the 5% significance 
level. In dimension three however, the test is clearly too liberal for v = 3 and might be 
slightly too liberal for v = 5. We could not determine whether a similar issue also affects 
the parametric bootstrap for computational reasons. We do however believe that the use 



of estimation procedures specifically tailored to the multivariate t (see e.g. Nadarajah 



and Kotz, 2008, Section 3) would solve this problem. 

[Table 4 about here.] 



4 Illustration 

The results of the univariate, bivariate and trivariate experiments whose results were 
partially reported in the previous section hence suggest that the multiplier procedure can 
be safely used as a large-sample alternative to the parametric bootstrap. 

To illustrate the computational advantage of the multiplier procedure over the para- 



metric bootstrap, we consider the financial data analyzed in McNeil, Frey, and Embrechts 



(2005, Chapter 5). These consist of five years of daily log-returns (1996-2000) for the Intel 
(INTC), Microsoft (MSFT) and General Electric (GE) stocks, which gives a trivariate 
sample of size n = 1262. Univariate goodness-of-fit tests for N, T5, T10, T20 and L were 
first applied to the Intel log-return data. The other distributions used in the univariate 
simulations were not considered as their support is (0, oo). Approximate p- values and 
execution times are reported in the first horizontal block of Table |5j The execution times 
are obtained from our R implementations of the multiplier procedure and of the para- 
metric bootstrap. As one can see, the approximate p-values of the multiplier procedure 
and the parametric bootstrap are fairly close, and the execution times are of the same 
order of magnitude. The latter observation is essentially due to the fact that (i) the 
numerical estimation of the parameters of the hypothesized distributions (on which the 
parametric bootstrap heavily relies) is reasonably fast in dimension one, and (ii) the gra- 
dients needed in the multiplier procedure are computed numerically using the numDeriv 
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package as explained previously. Bivariate goodness-of-flt tests for N, NC, T10N, T5, T10 
and T20 were then applied to the bivariate log-returns of the Intel and General Electric 
stocks. Again, the approximate p-values of the multiplier procedure and the parametric 
bootstrap are fairly close. This time, however, the computational advantage of the mul- 
tiplier procedure is obvious, in particular when T10N is hypothesized (1.6 minutes versus 

4.2 hours for the parametric bootstrap on one 2.33 GHz processor). The approximate 
p- values and executions times in italic in Table [5] were obtained using the multiplier pro- 
cedure based on the gradients computed using the more analytical approach described 
in Appendix [Bj Finally, goodness-of-flt tests for N, NC, T10N, T5, T10 and T20 were 
applied to the trivariate log-returns. From the third horizontal block of Table |5j we 
see that the computational advantage of the multiplier is even more pronounced than in 
dimension two. For T10N for instance, the execution of the multiplier procedure took 

4.3 minutes while 16.6 hours were necessary to obtain an approximate p-value using the 
parametric bootstrap. 

[Table 5 about here.] 

The entries in italic in Table [5] show that the execution times of the generic implemen- 
tation of the multiplier procedure based on the numDeriv package can be significantly 
lowered at the expense of more analytical and programming work. Formulas similar to 
or simpler than those given in Appendix [B] could be obtained for all the multivariate 
distributions considered in this work. 

The approximate p- values given in the third horizontal block of Table [5] indicate that 
there is very little evidence against the trivariate distributions T5 and T10. On the basis 
of these tests, we can conclude that a trivariate t distribution whose d.f. are close to 10 
is a plausible model for these financial data. 
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A Proofs of the propositions 

The following lemma will be used in the proofs of Propositions [T] and [2] 

Lemma 1. Let X%, . . . ,X n be a random sample from a distribution P that may or may 
not belong to {Pg : 9 G O}. If Assumptions (A1)-(A4) are satisfied, and if there exists 
8q G O such that 9 n converges in probability to 6q under P, then 

(/ i y G n f - G n ^J P e J, f ^ G'J - G^JA/, f ^ Kf ~ K<Pej) 

converges weakly to 

(/ ^ G P f - Gp^PgJ, f h+ G'pf - G^JA/, / ^ G' P / - G'p^lPgj) 
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in 



'(J 7 )} 3 , where Gp, a P-Brownian bridge, is the weak limit of G n , and G' p 



is an 



independent copy ofGp. 



Proof. From Assumption (A3), {if>g } is P-Donsker. It follows that the class Q obtained as 
the union of J 7 and the p components of if)g is P-Donsker. From the functional multiplier 



central limit theorem (see e.g. Kosorok, 2008, Theorem 10.1 and Corollary 10.3), we then 
have that 



up, 



in 



J 6>o' 



(Q)} 3 - By the continuous mapping theorem, it first follows that 

(G n , G n ip 6o ,G n , G' n ip 6o , G n , G'^g ) ~* {Gg a , Gg ipg , G' 0O , G' e ^ do , I 
in {^(J 7 ) x r} 3 , and then that 

f^GJ- G^lPeJ, f i y G'J - G'^lPeJ, f ^ G"J - G'^PgJ 

converges weakly to 

f^G P f- Gp^ PeJ, f H- G'pf - G'piPlPeJ, f h+ G' P f - G^JA/ 

Now, from Assumption (A3), there exists a 5 > such that H = {ipe : \\9 — 9 \\ < 5} 
is P-Donsker. Let Hk, k G {l,...,p}, be the p component classes of "H. They are 
P-Donsker by definition. 

Next, fix k G {1, . . . ,p} and let g be a function from £°°(T-Lk) x % _ > K defined by 
g(z,if>) = z(ip) — z(ip d0)k ), where ipe ,k is the fcth component of ipg . As noted in |van der 



(9) 



(10) 



in 



Vaart (1998, proof of Lemma 19.24), the set T-Lk is a semimetric space with respect to 



metric L 2 {P) and the function g is continuous with respect to the product semimetric on 
£°°('Hk) x Hk at every point (z,if)) such that if) i-> z(if>) is continuous. 

From Assumption (A4), the fact that 9 n converges to 9q in probability, and Lemma 
2.12 of van der Vaart (1998), we also have that if>e n ,k converges in probability to ipe ,k 
in the space %k equipped with the metric L 2 (P). Since Hk is P-Donsker, G' n ^ G' P 
in £°°(T-Lk)- Also, since 9 n converges to 9 in probability, the probability that if>g nt k is in 
7i k tends to 1. On that event, it follows that (G' n ,if>g n; k) ~» (G' P ,ipg 0) k) in £°°(T-Lk) x 
Since ip k G' P if>k is continuous at every ip k G %k almost surely, the function g is 
continuous at almost every (G' P ,if)g 0t k). By the continuous mapping theorem, we obtain 
that g(G' n ,if)g n}k ) = G' n if)g n)k - G' n if>g 0t k ~> g(G' P ,if)g 0tk ) = 0. Hence, we have that 



' n i>e n ,k ~ G' n Veo,fc = o P (l), k G {1, . . . ,p}. 



(11) 



Similarly, we have that 



G'& 0ntk - G'^ k = o P (l), ke{l,..., P }. (12) 
From Assumption (A2) and the fact 9 n converges to 9 in probability, we also have that 



sup \\P 9 J 



PoofW = o P (l). (13) 
Finally, combining ^ and (10) with (11), (12) and (13), we obtain the desired result. □ 
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We can now prove Proposition [T] and [2] 
Proof of Proposition [TJ, Assumption (A5) implies that 9 n converges to 9$ m proba- 



bility. From Assumption (Al) and Lemma 2.12 of van der Vaart (1998), we then have 
that 

sup \P e J - P e J - (9 n - 9 ) T P e J\ = \\9 n - e \\o Pe (l), 
which in turn is implies that 

sup \Va(PU - PeJ) ~ V^(0 n ~ ) T P e J\ = o P (1), 

since \\Jn(9 n — 9 )\\ = Pg (1) from Assumption (A5) and the continuous mapping the- 
orem. It follows that 



V^(Pn - Pe n ) = v^(Pn - Pe ) - \fc(Pe n - Pe ) 

= v^(P„ - Pe ) - V^(9 n - 9 ) T Pe + R n , 

where sup^gj- \R n f\ = °p 9o (1)- Using Assumption (A5) again, we obtain that 

v^(P n / - PeJ) = Gnf - G^lPeJ + Q n f, f e P, (14) 
where sup^gj- \Q n f\ = op e (1). The result is finally an immediate consequence of Lemma 

Proof of Proposition [2| Write 

<^(P n - P 6n ) = v^(Pn - P) - sfti{Pe n - Pe ) ~ V^(Pe - P). (15) 



The first term converges weakly to Gp in £°°(P). Now, the convergence in distribution 
of Jn(9 n — 9q) implies that 9 n converges in probability to 9 . Hence, proceeding as in the 
proof of Proposition [IJ Assumption (Al) implies that 



sup \Vn(P 9 J - PeJ) - V^(0 n - 9 ) T P e J\ = o P (l), 

IGF 



which in turn implies that the second term in (15) converges weakly in £°°(P). However, 
since P {Pq : 9 G O}, the supremum over P of the third term diverges, which implies 
that 

SU P |v^(Pn/-^„/)|4oO. 

The second part of the proposition is an immediate consequence of Lemma [TJ □ 

B Computational details for the multivariate t 

The p.d.f. of the centered <i-dimensional multivariate t with dispersion matrix £ and v 
d.f. is given by 

t u ,-z(x) = J^J- r 1 + -x T Yr x x , x e R d . (16) 
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Let T„ s denote the corresponding c.d.f. It is easy to verify that the c.d.f. of the multivari- 
ate t with v d.f., expectation vector (pi, . . . ,p d ), dispersions Ai, . . . , A^ and correlation 
matrix E is then given by 

rp / x rp fXl-fJ.1 X d -fl d \ d 

Tp^xix) = T u ^ I — ^ — , • • . , — y — I , i£K. 
The corresponding p. d.f. is thus 

Let us first explain how, for any x G M. d , the gradient of T^s^^x) with respect to 
all the parameters except v can be computed. Let j G {1, . . . ,d}, and, for any x G M. d , 
let T^(x) = dT u ^(x)/dxj. Also, let E_j_j be a (d — 1) x (d — 1) matrix obtained from 
E by removing its jth row and jth column, be a (d — 1) x 1 matrix obtained from 

E by removing its jth row and keeping only its jth column, and = El - •. From 



Nadarajah and Kotz (2005, page 66), if X is standard multivariate t with i/ d.f. and 
correlation matrix E, then, conditionally on Xj = xj, we have that 



3 



is multivariate centered t with v + 1 degrees of freedom and dispersion matrix Aj 
E_j_j — E-jjEj-j. Hence, 



Using the previous expression, it is therefore possible to compute 
and 

dT^^x) _ _Xj - /ii T (i) / zi ~/ii Zd -/^d \ x G md 



d\) 2A| Ai A 

Also, let yOjj be an off-diagonal element of the correlation matrix E. Then 



^ m / \ Fit ^ f ^i-z^i A 



x G M d , 



<9pij dpij 
and, for any x G M d , dT v ^(x)/dpij can be computed using the Plackett formula for the 



multivariate t (see Genz, 2004 Kojadinovic and Yan, 2011, Proposition 1). 
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Let us now discuss the computation, for any x G M d , of the gradient of \ogt v ^^\{x) 
with respect to all the parameters except v. Let j G {1, . . . , d}, and, for any x G M. d , let 
^vt,{ x ) = dt Vj %(x)/dXj. Starting from (16), one obtains that 



[v + d)x T J:- 1 e j 
v + x T H~ 1 x 



x G 



where Cj is the unit vector of lR d whose ith component is 1 if i — j and otherwise. Using 
the previous expression, it is therefore possible to compute 



-l 



I Ai ' • • • ' A d 



x G 



Ai ' • • • ' A rf 



and 



(J) ( xr-ui 



2AJ 



Xj - /ij "^S ^ Ai ' • • • ' A d 
3 L »V I Ax ' • • • ' A d 



Finally, starting again from (|16|), one obtains that 
dt„x(x) I 



\Y\dpij v + x T Y.~ 1 x 



a; G 



x G 



From Seber (2008, Chap. 17) for instance, we have that 



gjSj 
dpij 



2K f . 



and that 



Oil 



where is the cofactor of pi and where rj is the z-th column of £ . 
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Figure 1: Plot of the five p.d.f.s used for data generation in the first univariate exper- 
iment. The normal distribution is the iV(10, 1), while the parameters of the remaining 
four distributions were determined as approximate minimizers of the Kullback-Leibler 
divergence between the distribution and the normal. 
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Table 2: Rejection rate (in %) of the null hypothesis in the bivariate case as observed in 
1000 random samples of size n = 100, 200, 300, 400 and 500. 
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Table 3: Rejection rate (in %) of the null hypothesis in the trivariate case as observed in 
1000 random samples of size n = 100, 200, 300, 400 and 500. 
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Table 4: Rejection rate (in %) of the null hypothesis that the data come from a d- 
dimensional t distribution with fixed d.f. v as observed in 1000 random samples of size 
n = 200, 500, 1000 and 2000 generated from bivariate and trivariate t distributions with 
v d.f. fitted from the financial data used in Section HI 



n d = 2 d = 3 

v 3 5 7 10 20 30 3 5 7 10 20 30 

200 4.2 3.4 3.1 3.5 4.1 4.4 8.6 3.2 3.7 2.2 2.3 3.3 

500 4.7 4.5 5.6 4.7 4.7 4.1 11.8 5.0 3.7 4.5 4.0 3.5 

1000 4.6 4.4 4.6 5.6 5.6 5.0 12.9 7.6 4.6 5.1 4.3 4.7 

2000 6.1 4.2 5.2 5.6 4.5 4.8 11.1 6.6 4.8 5.2 4.9 4.8 
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Table 5: Approximate p-values and executions times on one 2.33 GHz processor of the 
multiplier procedure (MP) and the parametric bootstrap (PB) for the financial data 



analyzed in McNeil et al. (2005, Chapter 5). The approximate p- values and executions 
times in italic were obtained using MP in which the gradients are computed using the 
more analytical approach described in Appendix |B} 



Variables 



Distribution p-value Time in seconds 
MP PB MP PB 



INTC N 

T5 
T10 
T20 
L 

(INTC, GE) N 

NC 
T10N 
T5 
T10 
T10 
T20 

(INTC, GE, MSFT) N 

NC 
T10N 
T5 
T10 
T10 
T20 






000 





000 


9 


5 


14 








066 





077 


9 


3 


43 


2 





538 





520 


9 


3 


40 


1 





034 





017 


9 


3 


40 


4 





461 





405 


9 


2 


18 


7 





000 





000 


79 


2 


1934 


9 





000 





000 


35 


6 


12185 








022 





012 


98 


3 


14969 


8 





043 





026 


8 





2089 


8 





187 





184 


9 





2093 


1 





200 





181 


95 


6 


2238 


1 





003 





004 


9 


2 


2088 








000 





000 


166 


1 


3578 


3 





000 





000 


93 


9 


27954 


9 





000 





000 


256 





59902 


1 





077 





097 


15 


5 


4340 








119 





139 


U 


9 


4459 


4 





133 





149 


252 


8 


4740 


8 





004 





021 


U 


4 


4476 


4 
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